{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 多层感知机的从零开始实现\n",
    "\n",
    "我们已经从上一节里了解了多层感知机的原理。下面，我们一起来动手实现一个多层感知机。首先导入实现所需的包或模块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "9"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import d2lzh as d2l\n",
    "from mxnet import nd\n",
    "from mxnet.gluon import loss as gloss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 读取数据集\n",
    "\n",
    "这里继续使用Fashion-MNIST数据集。我们将使用多层感知机对图像进行分类。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "2"
    }
   },
   "outputs": [],
   "source": [
    "batch_size = 256\n",
    "train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义模型参数\n",
    "\n",
    "我们在[“softmax回归的从零开始实现”](softmax-regression-scratch.ipynb)一节里已经介绍了，Fashion-MNIST数据集中图像形状为$28 \\times 28$，类别数为10。本节中我们依然使用长度为$28 \\times 28 = 784$的向量表示每一张图像。因此，输入个数为784，输出个数为10。实验中，我们设超参数隐藏单元个数为256。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "3"
    }
   },
   "outputs": [],
   "source": [
    "num_inputs, num_outputs, num_hiddens = 784, 10, 256\n",
    "\n",
    "W1 = nd.random.normal(scale=0.01, shape=(num_inputs, num_hiddens))\n",
    "b1 = nd.zeros(num_hiddens)\n",
    "W2 = nd.random.normal(scale=0.01, shape=(num_hiddens, num_outputs))\n",
    "b2 = nd.zeros(num_outputs)\n",
    "params = [W1, b1, W2, b2]\n",
    "\n",
    "for param in params:\n",
    "    param.attach_grad()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义激活函数\n",
    "\n",
    "这里我们使用基础的`maximum`函数来实现ReLU，而非直接调用MXNet的`relu`函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "4"
    }
   },
   "outputs": [],
   "source": [
    "def relu(X):\n",
    "    return nd.maximum(X, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义模型\n",
    "\n",
    "同softmax回归一样，我们通过`reshape`函数将每张原始图像改成长度为`num_inputs`的向量。然后我们实现上一节中多层感知机的计算表达式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "5"
    }
   },
   "outputs": [],
   "source": [
    "def net(X):\n",
    "    X = X.reshape((-1, num_inputs))\n",
    "    H = relu(nd.dot(X, W1) + b1)\n",
    "    return nd.dot(H, W2) + b2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义损失函数\n",
    "\n",
    "为了得到更好的数值稳定性，我们直接使用Gluon提供的包括softmax运算和交叉熵损失计算的函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "6"
    }
   },
   "outputs": [],
   "source": [
    "loss = gloss.SoftmaxCrossEntropyLoss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练模型\n",
    "\n",
    "训练多层感知机的步骤和[“softmax回归的从零开始实现”](softmax-regression-scratch.ipynb)一节中训练softmax回归的步骤没什么区别。我们直接调用`d2lzh`包中的`train_ch3`函数，它的实现已经在[“softmax回归的从零开始实现”](softmax-regression-scratch.ipynb)一节里介绍过。我们在这里设超参数迭代周期数为5，学习率为0.5。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "7"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 1, loss 0.7760, train acc 0.709, test acc 0.824\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 2, loss 0.4921, train acc 0.818, test acc 0.842\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 3, loss 0.4226, train acc 0.845, test acc 0.863\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 4, loss 0.3927, train acc 0.856, test acc 0.863\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 5, loss 0.3659, train acc 0.864, test acc 0.868\n"
     ]
    }
   ],
   "source": [
    "num_epochs, lr = 5, 0.5\n",
    "d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,\n",
    "              params, lr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "* 可以通过手动定义模型及其参数来实现简单的多层感知机。\n",
    "* 当多层感知机的层数较多时，本节的实现方法会显得较烦琐，如在定义模型参数的时候。\n",
    "\n",
    "## 练习\n",
    "\n",
    "* 改变超参数`num_hiddens`的值，看看对实验结果有什么影响。\n",
    "* 试着加入一个新的隐藏层，看看对实验结果有什么影响。\n",
    "\n",
    "\n",
    "\n",
    "## 扫码直达[讨论区](https://discuss.gluon.ai/t/topic/739)\n",
    "\n",
    "![](../img/qr_mlp-scratch.svg)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}